Acquiring Plausible Unification-Based Grammars using Model-Based and Data-Driven Learning

Author

  • Miles Osborne
Abstract

Undergeneration is a problem that undermines the successful parsing of unrestricted text. A popular solution to this problem is automatic grammar correction (or machine learning of grammar). Broadly speaking, grammar correction approaches can be classified as either data-driven or model-based. Data-driven learners use data-intensive methods to acquire grammars. They typically use grammar formalisms unsuited to the needs of practical text processing; that is, data-driven learners acquire grammars that overgenerate and fail to assign linguistically plausible parses. Model-based learners are knowledge-intensive, and their success depends on the completeness of a model of grammaticality. In practice, however, the model will be incomplete. Since we deal with undergeneration by learning, we hypothesise that combining data-driven and model-based learning would allow data-driven learning to compensate for the incompleteness of model-based learning, while model-based learning would compensate for the inadequacy of data-driven learning. We describe a system that we have used to test this hypothesis empirically. The system combines data-driven and model-based learning to acquire unification-based grammars that are more suitable for practical text parsing, and we use it to illustrate the strengths and weaknesses of both learning styles. Finally, using the Spoken English Corpus as data, and by quantitatively measuring undergeneration, overgeneration and parse plausibility, we show that the hypothesis is promising.
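The abstract evaluates grammars partly by measuring undergeneration and overgeneration. The Python sketch below shows one common way to operationalise those two measures, assuming a toy context-free backbone over part-of-speech tags and a CKY recogniser. The rule set, the tag names and the functions `cky_recognise`, `undergeneration` and `overgeneration` are hypothetical illustrations, not the system described in the paper, which works with unification-based grammars and whose exact metrics are not given here; parse plausibility is not modelled. The rules added in the usage section merely stand in for rules a learner might propose.

```python
from itertools import product

# Hypothetical toy grammar over part-of-speech tags, in Chomsky normal form:
# each (left, right) pair of child categories maps to the parent categories it licenses.
BINARY_RULES = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}


def cky_recognise(tags, binary_rules, start="S"):
    """Return True if the tag sequence can be derived from `start` (CKY recognition)."""
    n = len(tags)
    if n == 0:
        return False
    # chart[i][j] holds every category that spans tags[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tag in enumerate(tags):
        chart[i][i + 1].add(tag)  # tags serve directly as preterminal categories
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # try every split point of the span
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((left, right), set())
    return start in chart[0][n]


def acceptance_rate(sentences, binary_rules, start="S"):
    """Fraction of sentences the grammar accepts."""
    if not sentences:
        return 0.0
    accepted = sum(cky_recognise(tags, binary_rules, start) for tags in sentences)
    return accepted / len(sentences)


def undergeneration(grammatical, binary_rules):
    # Share of sentences judged grammatical that the grammar fails to parse.
    return 1.0 - acceptance_rate(grammatical, binary_rules)


def overgeneration(ungrammatical, binary_rules):
    # Share of strings judged ungrammatical that the grammar nevertheless accepts.
    return acceptance_rate(ungrammatical, binary_rules)


if __name__ == "__main__":
    grammatical = [["Det", "N", "V", "Det", "N"], ["Det", "N", "V"]]
    ungrammatical = [["N", "Det", "V"]]
    print(undergeneration(grammatical, BINARY_RULES))  # 0.5: the intransitive clause has no parse
    # Stand-in for a learned rule: an NP followed by a bare verb forms an S.
    extended = {**BINARY_RULES, ("NP", "V"): {"S"}}
    print(undergeneration(grammatical, extended))  # 0.0
    print(overgeneration(ungrammatical, extended))  # 0.0
    # An over-general rule closes no further gaps but lets the ungrammatical string through.
    overgeneral = {**extended, ("N", "Det"): {"NP"}}
    print(overgeneration(ungrammatical, overgeneral))  # 1.0
```

The example makes the trade-off in the abstract concrete: adding rules lowers undergeneration, but a rule that is too general raises overgeneration, which is why some check on the plausibility of learned rules is needed.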


Similar resources

Learning Unification-Based Grammars Using the Spoken English Corpus

This paper describes a grammar learning system that combines model-based and data-driven learning within a single framework. Our results from learning grammars using the Spoken English Corpus (SEC) suggest that combined model-based and data-driven learning can produce a more plausible grammar than is the case when using either learning style in isolation.

Learning Unification-based Grammars and the Treatment of Undergeneration

We present a framework for learning plausible unification-based natural language grammars. Our framework uses both model-based and data-driven learning without being committed to any particular configuration of these two learning schemes. We use learning to overcome the problem of undergeneration in natural language grammars. This paper presents work that is still in progress: the model-based lea...

Learning unification-based natural language grammars

Practical text processing systems need wide-coverage grammars. When parsing unrestricted language, such grammars often fail to generate all of the sentences that humans would judge to be grammatical. This problem undermines successful parsing of the text and is known as undergeneration. There are two main ways of dealing with undergeneration: either by sentence correction, or by grammar correct...

Formalization and Parsing of Typed Unification-Based ID/LP Grammars

This paper defines unification-based ID/LP grammars based on typed feature structures as nonterminals and proposes a variant of Earley's algorithm to decide whether a given input sentence is a member of the language generated by a particular typed unification ID/LP grammar. A solution to the problem of the nonlocal flow of information in unification ID/LP grammars as mentioned in Seiffert (1991) is in...

Publication date: 1995